<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
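	<title>Stemming | ElasticSearch 7.7 Definitive Guide (Chinese edition)</title>
	<meta name="keywords" content="ElasticSearch Definitive Guide (Chinese edition), elasticsearch 7, es7, real-time data analytics, real-time data search" />
    <meta name="description" content="ElasticSearch Definitive Guide (Chinese edition), elasticsearch 7, es7, real-time data analytics, real-time data search" />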
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
	<link rel="stylesheet" type="text/css" href="../static/styles.css" />
	<script>
	var _link = 'stemming.html';
    </script>
</head>
<body>
<div class="main-container">
    <section id="content">
        <div class="content-wrapper">
            <section id="guide" lang="zh_cn">
                <div class="container">
                    <div class="row">
                        <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                            <div style="color:gray; word-break: break-all; font-size:12px;">Original English version: <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.7/stemming.html" rel="nofollow" target="_blank">https://www.elastic.co/guide/en/elasticsearch/reference/7.7/stemming.html</a>. Copyright of the original document belongs to www.elastic.co.<br/>Local English version: <a href="../en/stemming.html" rel="nofollow" target="_blank">../en/stemming.html</a></div>
                        <!-- start body -->
                  <div class="page_header">
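<strong>Important</strong>: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html" rel="nofollow">current release documentation</a>.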
</div>
<div id="content">
<div class="navheader">
<span class="prev">
<a href="analysis-index-search-time.html">« Index and search analysis</a>
</span>
<span class="next">
<a href="token-graphs.html">Token graphs »</a>
</span>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="stemming"></a>Stemming
</h2>
</div></div></div>
<p><em>Stemming</em> is the process of reducing a word to its root form. This ensures
variants of a word match during a search.</p>
<p>For example, <code class="literal">walking</code> and <code class="literal">walked</code> can be stemmed to the same root word:
<code class="literal">walk</code>. Once stemmed, an occurrence of either word would match the other in a
search.</p>
<p>Stemming is language-dependent but often involves removing prefixes and
suffixes from words.</p>
<p>In some cases, the root form of a stemmed word may not be a real word. For
example, <code class="literal">jumping</code> and <code class="literal">jumpiness</code> can both be stemmed to <code class="literal">jumpi</code>. While <code class="literal">jumpi</code>
isn’t a real English word, it doesn’t matter for search; if all variants of a
word are reduced to the same root form, they will match correctly.</p>
<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="stemmer-token-filters"></a>Stemmer token filters
</h3>
</div></div></div>
<p>In Elasticsearch, stemming is handled by stemmer <a class="xref" href="analyzer-anatomy.html#analyzer-anatomy-token-filters" title="Token filters">token
filters</a>. These token filters can be categorized based on how they stem words:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<a class="xref" href="stemming.html#algorithmic-stemmers" title="Algorithmic stemmers">Algorithmic stemmers</a>, which stem words based on a set
of rules
</li>
<li class="listitem">
<a class="xref" href="stemming.html#dictionary-stemmers" title="Dictionary stemmers">Dictionary stemmers</a>, which stem words by looking them
up in a dictionary
</li>
</ul>
</div>
<p>Because stemming changes tokens, we recommend using the same stemmer token
filters during <a class="xref" href="analysis-index-search-time.html" title="Index and search analysis">index and search analysis</a>.</p>
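<p>As a minimal sketch of this recommendation, the Python snippet below builds a request body that could be sent when creating an index. The analyzer name <code class="literal">stemmed_text</code> and the field name <code class="literal">title</code> are hypothetical, and the choice of the <code class="literal">standard</code> tokenizer and <code class="literal">lowercase</code> filter is an assumption made for illustration. Because the mapping sets only <code class="literal">analyzer</code> and no separate <code class="literal">search_analyzer</code>, the same stemmer token filter applies during both index and search analysis.</p>

```python
import json

# Hypothetical names: "stemmed_text" (analyzer) and "title" (field).
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "stemmed_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Lowercase first, then apply the stemmer token filter.
                    "filter": ["lowercase", "stemmer"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            # No separate search_analyzer is set, so this analyzer is used
            # for both index and search analysis.
            "title": {"type": "text", "analyzer": "stemmed_text"}
        }
    },
}

# Print the JSON body that could be sent when creating an index.
print(json.dumps(index_body, indent=2))
```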
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="algorithmic-stemmers"></a>Algorithmic stemmers
</h3>
</div></div></div>
<p>Algorithmic stemmers apply a series of rules to each word to reduce it to its
root form. For example, an algorithmic stemmer for English may remove the <code class="literal">-s</code>
and <code class="literal">-es</code> suffixes from the end of plural words.</p>
<p>Algorithmic stemmers have a few advantages:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
They require little setup and usually work well out of the box.
</li>
<li class="listitem">
They use little memory.
</li>
<li class="listitem">
They are typically faster than <a class="xref" href="stemming.html#dictionary-stemmers" title="Dictionary stemmers">dictionary stemmers</a>.
</li>
</ul>
</div>
<p>However, most algorithmic stemmers only alter the existing text of a word. This
means they may not work well with irregular words that don’t contain their root
form, such as:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<code class="literal">be</code>, <code class="literal">are</code>, and <code class="literal">am</code>
</li>
<li class="listitem">
<code class="literal">mouse</code> and <code class="literal">mice</code>
</li>
<li class="listitem">
<code class="literal">foot</code> and <code class="literal">feet</code>
</li>
</ul>
</div>
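<p>The behavior described above can be illustrated with a toy suffix-stripping function in Python. This is only a sketch: the rule list and the <code class="literal">toy_stem</code> name are invented for illustration and are far simpler than the rules used by the stemmers that ship with Elasticsearch. It shows regular variants collapsing to one root form, while an irregular pair stays unmatched.</p>

```python
# Toy suffix rules, checked in order. Real stemmers use far richer rule sets.
SUFFIX_RULES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Lowercase the word and strip the first suffix rule that matches."""
    lowered = word.lower()
    for suffix in SUFFIX_RULES:
        if lowered.endswith(suffix):
            return lowered[: -len(suffix)]
    return lowered


# Regular variants collapse to one root form.
regular = {toy_stem(w) for w in ("walking", "walked", "walks")}

# Irregular pairs do not contain their root form, so the rules cannot join them.
irregular = (toy_stem("mouse"), toy_stem("mice"))

print(regular, irregular)
```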
<p>The following token filters use algorithmic stemming:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<a class="xref" href="analysis-stemmer-tokenfilter.html" title="Stemmer token filter"><code class="literal">stemmer</code></a>, which provides algorithmic
stemming for several languages, some with additional variants.
</li>
<li class="listitem">
<a class="xref" href="analysis-kstem-tokenfilter.html" title="KStem token filter"><code class="literal">kstem</code></a>, a stemmer for English that combines
algorithmic stemming with a built-in dictionary.
</li>
<li class="listitem">
<a class="xref" href="analysis-porterstem-tokenfilter.html" title="Porter stem token filter"><code class="literal">porter_stem</code></a>, our recommended algorithmic
stemmer for English.
</li>
<li class="listitem">
<a class="xref" href="analysis-snowball-tokenfilter.html" title="Snowball token filter"><code class="literal">snowball</code></a>, which uses
<a href="http://snowball.tartarus.org/" class="ulink" target="_top">Snowball</a>-based stemming rules for several
languages.
</li>
</ul>
</div>
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="dictionary-stemmers"></a>Dictionary stemmers
</h3>
</div></div></div>
<p>Dictionary stemmers look up words in a provided dictionary, replacing unstemmed
word variants with stemmed words from the dictionary.</p>
<p>In theory, dictionary stemmers are well suited for:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
Stemming irregular words
</li>
<li class="listitem">
<p>Distinguishing between words that are spelled similarly but are not related
conceptually, such as:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<code class="literal">organ</code> and <code class="literal">organization</code>
</li>
<li class="listitem">
<code class="literal">broker</code> and <code class="literal">broken</code>
</li>
</ul>
</div>
</li>
</ul>
</div>
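<p>The lookup idea can be sketched in Python with a plain mapping. The tiny dictionary and the <code class="literal">dictionary_stem</code> function below are hypothetical. A real dictionary stemmer also handles prefixes and suffixes, which this sketch leaves out. The sketch shows how irregular words resolve to their root, how similarly spelled words stay apart, and how a word missing from the dictionary passes through unchanged.</p>

```python
# Hypothetical miniature dictionary mapping word variants to stems.
STEM_DICTIONARY = {
    "am": "be",
    "are": "be",
    "mice": "mouse",
    "feet": "foot",
    "organ": "organ",
    "organization": "organization",
}


def dictionary_stem(word: str) -> str:
    """Return the dictionary stem, or the lowercased word if it is not listed."""
    lowered = word.lower()
    return STEM_DICTIONARY.get(lowered, lowered)


stems = [dictionary_stem(w) for w in ("mice", "Feet", "are", "organization")]
print(stems)
```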
<p>In practice, algorithmic stemmers typically outperform dictionary stemmers. This
is because dictionary stemmers have the following disadvantages:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<span class="strong strong"><strong>Dictionary quality</strong></span><br>
A dictionary stemmer is only as good as its dictionary. To work well, these
dictionaries must include a significant number of words, be updated regularly,
and change with language trends. Often, by the time a dictionary has been made
available, it’s incomplete and some of its entries are already outdated.
</li>
<li class="listitem">
<span class="strong strong"><strong>Size and performance</strong></span><br>
A dictionary stemmer must load all words, prefixes, and suffixes from its
dictionary into memory. This can use a significant amount of RAM. Low-quality
dictionaries may also be less efficient with prefix and suffix removal, which
can slow the stemming process significantly.
</li>
</ul>
</div>
<p>You can use the <a class="xref" href="analysis-hunspell-tokenfilter.html" title="Hunspell token filter"><code class="literal">hunspell</code></a> token filter to
perform dictionary stemming.</p>
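<p>A hedged sketch of such a configuration, written as a Python dictionary that serializes to index settings, might look like the following. The filter name <code class="literal">en_us_dict_stemmer</code> and analyzer name <code class="literal">dictionary_stemmed</code> are hypothetical, and the sketch assumes a Hunspell dictionary for the <code class="literal">en_US</code> locale has already been installed on each node. The filter cannot work without those dictionary files.</p>

```python
import json

# Hypothetical filter and analyzer names; "locale" must match an installed
# Hunspell dictionary (assumed here to be en_US).
hunspell_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "en_us_dict_stemmer": {"type": "hunspell", "locale": "en_US"}
            },
            "analyzer": {
                "dictionary_stemmed": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Lowercase first, then stem through the dictionary filter.
                    "filter": ["lowercase", "en_us_dict_stemmer"],
                }
            },
        }
    }
}

print(json.dumps(hunspell_settings, indent=2))
```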
<div class="tip admon">
<div class="icon"></div>
<div class="admon_content">
<p>If available, we recommend trying an algorithmic stemmer for your language
before using the <a class="xref" href="analysis-hunspell-tokenfilter.html" title="Hunspell token filter"><code class="literal">hunspell</code></a> token filter.</p>
</div>
</div>
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="control-stemming"></a>Control stemming
</h3>
</div></div></div>
<p>Sometimes stemming reduces words that are spelled similarly but are not related
conceptually to the same root word. For example, a stemmer may reduce both <code class="literal">skies</code> and
<code class="literal">skiing</code> to <code class="literal">ski</code>.</p>
<p>To prevent this and better control stemming, you can use the following token
filters:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<a class="xref" href="analysis-stemmer-override-tokenfilter.html" title="Stemmer override token filter"><code class="literal">stemmer_override</code></a>, which lets you
define rules for stemming specific tokens.
</li>
<li class="listitem">
<a class="xref" href="analysis-keyword-marker-tokenfilter.html" title="Keyword marker token filter"><code class="literal">keyword_marker</code></a>, which marks
specified tokens as keywords. Keyword tokens are not stemmed by subsequent
stemmer token filters.
</li>
<li class="listitem">
<a class="xref" href="analysis-condition-tokenfilter.html" title="Conditional token filter"><code class="literal">conditional</code></a>, which can be used to mark
tokens as keywords, similar to the <code class="literal">keyword_marker</code> filter.
</li>
</ul>
</div>
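<p>As one possible illustration, the Python snippet below builds index settings that place a <code class="literal">keyword_marker</code> filter ahead of <code class="literal">porter_stem</code>. The filter name <code class="literal">protect_skies</code>, the analyzer name <code class="literal">controlled_stemming</code>, and the choice of protected word are assumptions made for this sketch.</p>

```python
import json

control_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Hypothetical filter name; "skies" is marked as a keyword.
                "protect_skies": {"type": "keyword_marker", "keywords": ["skies"]}
            },
            "analyzer": {
                "controlled_stemming": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # keyword_marker must come before the stemmer it guards.
                    "filter": ["lowercase", "protect_skies", "porter_stem"],
                }
            },
        }
    }
}

print(json.dumps(control_settings, indent=2))
```

<p>The order of the filter list matters here: only stemmer token filters that come after the marker leave keyword tokens unstemmed.</p>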
<p>For built-in <a class="xref" href="analysis-lang-analyzer.html" title="Language Analyzers">language analyzers</a>, you can also use the
<a class="xref" href="analysis-lang-analyzer.html#_excluding_words_from_stemming" title="Excluding words from stemming"><code class="literal">stem_exclusion</code></a> parameter to specify a list
of words that won’t be stemmed.</p>
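<p>A minimal sketch of this parameter, again as a Python dictionary that serializes to index settings, is shown below. The analyzer name <code class="literal">english_keep_skies</code> and the excluded words are illustrative assumptions.</p>

```python
import json

# Hypothetical analyzer name; the word list is illustrative only.
exclusion_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "english_keep_skies": {
                    # Built-in English language analyzer.
                    "type": "english",
                    # Words listed here are left unstemmed.
                    "stem_exclusion": ["skies", "skiing"],
                }
            }
        }
    }
}

print(json.dumps(exclusion_settings, indent=2))
```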
</div>

</div>
</div>

                  <!-- end body -->
                        </div>
                        <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                        
                        </div>
                    </div>
                </div>
            </section>
        </div>
    </section>
</div>
<script src="../static/cn.js"></script>
</body>
</html>